
Towards Unraveling and Improving Generalization in World Models

Fang, Qiaoyi, Du, Weiyu, Wang, Hang, Zhang, Junshan

arXiv.org Artificial Intelligence

World models have recently emerged as a promising approach to reinforcement learning (RL), achieving state-of-the-art performance across a wide range of visual control tasks. This work aims to obtain a deep understanding of the robustness and generalization capabilities of world models. Thus motivated, we develop a stochastic differential equation formulation by treating world model learning as a stochastic dynamical system, and characterize the impact of latent representation errors on robustness and generalization, both for the case of zero-drift representation errors and for that of non-zero-drift representation errors. Our somewhat surprising findings, based on both theoretical and experimental studies, reveal that in the zero-drift case, modest latent representation errors can in fact function as implicit regularization and hence improve robustness. We further propose a Jacobian regularization scheme to mitigate the compounding error-propagation effects of non-zero drift, thereby enhancing training stability and robustness. Our experimental studies corroborate that this regularization approach not only stabilizes training but also accelerates convergence and improves the accuracy of long-horizon prediction.
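The core idea of a Jacobian penalty can be sketched concretely: since a one-step latent transition with a large Jacobian amplifies representation errors at every rollout step, the loss adds a term penalizing the Jacobian's Frobenius norm. The toy `transition` model, the finite-difference estimator, and the weight `lam` below are illustrative assumptions, not the paper's exact scheme.

```python
import math

def transition(z, W):
    """Toy latent dynamics: one tanh layer; z and the output are plain lists."""
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]

def jacobian_fro_norm_sq(f, z, eps=1e-5):
    """Squared Frobenius norm of the Jacobian of f at z (central differences)."""
    total = 0.0
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += eps
        zm[i] -= eps
        col = [(a - b) / (2 * eps) for a, b in zip(f(zp), f(zm))]
        total += sum(c * c for c in col)
    return total

def regularized_loss(z, z_next, W, lam=0.1):
    """One-step prediction error plus a Jacobian penalty that damps the
    compounding amplification of latent representation errors over rollouts."""
    pred = transition(z, W)
    mse = sum((p - t) ** 2 for p, t in zip(pred, z_next)) / len(z)
    return mse + lam * jacobian_fro_norm_sq(lambda x: transition(x, W), z)
```

In a real world model the penalty would be computed with automatic differentiation rather than finite differences; the structure of the objective is the same.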


Limited Voting for Better Representation?

Venema-Los, Maaike, Christoff, Zoé, Grossi, Davide

arXiv.org Artificial Intelligence

Limited Voting (LV) is an approval-based method for multi-winner elections where all ballots are required to have the same fixed size. While it appears to be used as a voting method in corporate governance and has some political applications, to the best of our knowledge, no formal analysis of the rule exists to date. We provide such an analysis here, prompted by a request for advice about this voting rule from a health insurance company in the Netherlands, which uses it to elect its works council. We study conditions under which LV would improve representation over standard approval voting and when it would not. We establish the extent of such an improvement, or lack thereof, both in terms of diversity and proportionality notions. These results help us understand whether, and how, LV may be used as a low-effort fix to approval voting in order to enhance representation.
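The rule itself is simple to state in code: unlike standard approval voting, every ballot must approve exactly a fixed number of candidates, and the most-approved candidates fill the seats. The lexicographic tie-breaking below is an illustrative assumption; the paper's analysis concerns the rule's representation properties, not its implementation.

```python
from collections import Counter

def limited_voting(ballots, ballot_size, num_winners):
    """Limited Voting: every ballot approves exactly `ballot_size` candidates;
    the `num_winners` most-approved candidates are elected.
    Ties are broken lexicographically (an illustrative convention)."""
    for ballot in ballots:
        if len(set(ballot)) != ballot_size:
            raise ValueError("each ballot must approve exactly ballot_size candidates")
    tally = Counter(c for ballot in ballots for c in ballot)
    ranked = sorted(tally, key=lambda c: (-tally[c], c))
    return ranked[:num_winners]
```

Standard approval voting is recovered by dropping the ballot-size check, which is what makes LV a candidate "low-effort fix".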


Time Series Analysis of Key Societal Events as Reflected in Complex Social Media Data Streams

Skumanich, Andy, Kim, Han Kyul

arXiv.org Artificial Intelligence

Social media platforms hold valuable insights, yet extracting essential information can be challenging. Traditional top-down approaches often struggle to capture critical signals in rapidly changing events. As global events evolve swiftly, social media narratives, including instances of disinformation, become significant sources of insights. To address the need for an inductive strategy, we explore the niche social media platform GAB and the established messaging service Telegram, to develop methodologies applicable on a broader scale. This study investigates narrative evolution on these platforms using quantitative corpus-based discourse analysis techniques. Our approach is a novel mode of studying multiple social media domains to distil key information which may otherwise be obscured, allowing for useful and actionable insights. The paper details the technical and methodological aspects of gathering and preprocessing GAB and Telegram data for a keyness (Log Ratio) metric analysis, identifying crucial nouns and verbs for deeper exploration. Empirically, this approach is applied to a case study of a well-defined event that had global impact: the 2023 Wagner mutiny. The main findings are: (1) the timeline can be deconstructed to provide useful data features allowing for improved interpretation; (2) a methodology is applied which provides a basis for generalization. The key contribution is an approach that, in some cases, provides the ability to capture dynamic narrative shifts over time with elevated confidence. The approach can augment near-real-time assessment of key social movements, allowing for informed governance choices. This research is important because it lays out a useful methodology for time-series-relevant info-culling, which can enable proactive modes for positive social engagement.
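The keyness metric named in the abstract, Log Ratio, compares a word's relative frequency in a target corpus (say, a window around the event) against a reference corpus, on a log2 scale. The sketch below uses the common 0.5 adjustment for zero counts; that smoothing choice is an assumption, not necessarily the paper's exact setting.

```python
import math
from collections import Counter

def log_ratio_keyness(target_tokens, reference_tokens):
    """Log Ratio keyness: log2 of the ratio of a word's relative frequency
    in the target corpus to its relative frequency in the reference corpus.
    A score of +1 means the word is twice as frequent in the target.
    Zero counts are smoothed with 0.5 (a common convention, assumed here)."""
    tgt, ref = Counter(target_tokens), Counter(reference_tokens)
    n_tgt, n_ref = len(target_tokens), len(reference_tokens)
    scores = {}
    for word in set(tgt) | set(ref):
        f_t = (tgt.get(word, 0) or 0.5) / n_tgt
        f_r = (ref.get(word, 0) or 0.5) / n_ref
        scores[word] = math.log2(f_t / f_r)
    return scores
```

Ranking words by this score surfaces the nouns and verbs most distinctive of the event window, which the study then examines qualitatively.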


Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Ji, Jiabao, Hou, Bairu, Robey, Alexander, Pappas, George J., Hassani, Hamed, Zhang, Yang, Wong, Eric, Chang, Shiyu

arXiv.org Artificial Intelligence

Aligned large language models (LLMs) are vulnerable to jailbreaking attacks, which bypass the safeguards of targeted LLMs and fool them into generating objectionable content. While initial defenses show promise against token-based threat models, no existing defense provides robustness against semantic attacks while avoiding unfavorable trade-offs between robustness and nominal performance. To meet this need, we propose SEMANTICSMOOTH, a smoothing-based defense that aggregates the predictions of multiple semantically transformed copies of a given input prompt. Experimental results demonstrate that SEMANTICSMOOTH achieves state-of-the-art robustness against GCG, PAIR, and AutoDAN attacks while maintaining strong nominal performance on instruction-following benchmarks such as InstructionFollowing and AlpacaEval. The code will be made publicly available at https://github.com/UCSB-NLP-Chang/SemanticSmooth.
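The aggregation step of a smoothing-based defense can be sketched independently of any particular LLM: apply randomly chosen semantic transformations to the prompt, judge each copy, and take a majority vote. The `classify` and `transforms` callables below are stand-ins for an LLM safety judge and real semantic transformations (paraphrase, summarize, back-translate); this is a minimal sketch of the aggregation idea, not the SEMANTICSMOOTH implementation.

```python
import random
from collections import Counter

def smoothed_verdict(prompt, classify, transforms, n_copies=5, seed=0):
    """Smoothing-style defense sketch: transform the prompt `n_copies` times
    with randomly chosen semantic transformations, classify each copy as
    'safe' or 'unsafe', and return the majority vote. An attack string must
    survive most transformations to flip the aggregate verdict."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_copies):
        transformed = rng.choice(transforms)(prompt)
        votes[classify(transformed)] += 1
    return votes.most_common(1)[0][0]
```

The intuition is that adversarial suffixes tend to be brittle under paraphrase, while the harmful intent of a genuinely unsafe request survives transformation.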


Coordinated Disclosure for AI: Beyond Security Vulnerabilities

Cattell, Sven, Ghosh, Avijit

arXiv.org Artificial Intelligence

This legal action ignited a heated debate, contributing to a growing series of lawsuits against AI providers [9-11, 54]. This incident underscores the inadequacy of current AI harm reporting mechanisms, leaving small harmed parties with limited recourse unless backed by substantial legal support or media awareness, despite the recognized potential for improving AI systems by exposing issues [78]. Current AI accountability initiatives primarily rely on periodic audits, emphasizing repetitive assessments but lacking a structured reporting framework for user-identified issues post-deployment. This audit-centric paradigm is reflected in influential policies such as the U.S. Executive Order on AI [93], the EU's draft AI Act [43], and New York City's Local Law 144 [69]. However, this approach falls short when compared to the more comprehensive Coordinated Vulnerability Disclosure (CVD) processes standard in software security. CVD plays a crucial role as a mechanism for independent researchers to report newly identified vulnerabilities to affected vendors and the public [58]. This process enables transparent remediation before potential exploitation by malicious actors and has become a vital practice enshrined in government regulations and industry standards. Notably, the FDA mandates the implementation of CVD programs for medical device companies to enhance cybersecurity [96]. While CVD has demonstrated effectiveness in traditional software security, its direct application to machine learning (ML) systems faces unique challenges.


The Conditioning Bias in Binary Decision Trees and Random Forests and Its Elimination

Timár, Gábor, Kovács, György

arXiv.org Artificial Intelligence

Decision tree and random forest classification and regression are among the most widely used approaches in machine learning. Binary decision tree implementations commonly use conditioning in the form 'feature $\leq$ (or $<$) threshold', with the threshold being the midpoint between two observed feature values. In this paper, we investigate the bias introduced by the choice of conditioning operator (an intrinsic property of implementations) in the presence of features with lattice characteristics. We propose techniques to eliminate this bias, requiring one additional prediction with decision trees and incurring no cost for random forests. Using 20 classification and 20 regression datasets, we demonstrate that the bias can lead to statistically significant differences in terms of AUC and $r^2$ scores. The proposed techniques successfully mitigate the bias: compared to the worst-case scenario, statistically significant improvements of up to 0.1-0.2 percentage points in AUC and $r^2$ scores were achieved, and an improvement of 1.5 percentage points in $r^2$ score was measured in the most sensitive case of random forest regression. The implementation of the study is available on GitHub at the following repository: \url{https://github.com/gykovacs/conditioning_bias}.
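One way the operator bias can be neutralized at prediction time, sketched below on a toy tree: for lattice-valued features, shifting the input up by less than the lattice step turns every 'feature $\leq$ threshold' decision into 'feature $<$ threshold' whenever the threshold falls on the lattice, so averaging the original and shifted predictions averages out the two operators. This is an illustrative realization of the operator-averaging idea under stated assumptions, not the repository's exact implementation.

```python
def predict(tree, x):
    """Evaluate a binary tree that conditions on 'feature <= threshold'.
    `tree` is either a leaf value or a dict with keys f, t, left, right."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["f"]] <= tree["t"] else tree["right"]
    return tree

def operator_averaged_predict(tree, x, lattice_step):
    """Average the '<=' prediction with an emulated '<' prediction obtained
    by shifting each feature up by a fraction of the lattice step. The two
    predictions differ only when a feature value coincides with a threshold,
    which is exactly where the conditioning operator matters."""
    eps = lattice_step / 4.0
    shifted = [v + eps for v in x]
    return 0.5 * (predict(tree, x) + predict(tree, shifted))
```

For a forest, the same effect is achievable at no extra cost by evaluating half the trees on the shifted input instead of doubling the predictions.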


Applying BioBERT to Extract Germline Gene-Disease Associations for Building a Knowledge Graph from the Biomedical Literature

Gonzalez, Armando D. Diaz, Yue, Songhui, Hayes, Sean T., Hughes, Kevin S.

arXiv.org Artificial Intelligence

The volume of published biomedical information continues to increase rapidly. Recent advancements in Natural Language Processing (NLP) have generated considerable interest in automating the extraction, normalization, and representation of biomedical knowledge about entities such as genes and diseases. Our study analyzes germline abstracts to construct knowledge graphs capturing the immense body of work in this area on genes and diseases. This paper presents SimpleGermKG, an automatic knowledge graph construction approach that connects germline genes and diseases. For the extraction of genes and diseases, we employ BioBERT, a BERT model pre-trained on biomedical corpora. We propose an ontology-based and rule-based algorithm to standardize and disambiguate medical terms. For semantic relationships between articles, genes, and diseases, we implemented a part-whole relation approach to connect each entity with its data source and visualize them in a graph-based knowledge representation. Lastly, we discuss the knowledge graph applications, limitations, and challenges to inspire future research on germline corpora. Our knowledge graph contains 297 genes, 130 diseases, and 46,747 triples. Graph-based visualizations are used to show the results.
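The rule-based standardization step can be sketched without the ontology itself: normalize surface forms and map known synonyms to canonical identifiers, keeping unknown mentions verbatim for review. The `synonym_map` and the example gene/disease names below are illustrative stand-ins for a real ontology lookup, not the paper's resources.

```python
def standardize_mentions(mentions, synonym_map):
    """Rule-based term standardization sketch: lowercase each extracted
    mention, collapse internal whitespace, and map known synonyms to their
    canonical form; mentions absent from the map are returned unchanged
    so they can be flagged for manual disambiguation."""
    canonical = []
    for mention in mentions:
        key = " ".join(mention.lower().split())
        canonical.append(synonym_map.get(key, mention))
    return canonical
```

In a pipeline like the one described, BioBERT supplies the raw gene/disease mentions and a step of this shape reconciles them before triples are written to the graph.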


Machine Learning in Orbit Estimation: a Survey

Caldas, Francisco, Soares, Cláudia

arXiv.org Artificial Intelligence

Since the late 1950s, when the first artificial satellite was launched, the number of Resident Space Objects has steadily increased. It is estimated that around one million objects larger than one cm are currently orbiting the Earth, with only thirty thousand larger than ten cm being tracked. To avert a chain reaction of collisions, known as Kessler Syndrome, it is essential to accurately track and predict debris and satellites' orbits. Current approximate physics-based methods have errors on the order of kilometers for seven-day predictions, which is insufficient when considering space debris, typically less than one meter in size. This failure is usually due to uncertainty about the state of the space object at the beginning of the trajectory, forecasting errors in environmental conditions such as atmospheric drag, and unknown characteristics such as the mass or geometry of the space object. Operators can enhance Orbit Prediction accuracy by deriving unmeasured objects' characteristics and improving the modeling of non-conservative forces' effects by leveraging data-driven techniques, such as Machine Learning. In this survey, we provide an overview of the work in applying Machine Learning for Orbit Determination, Orbit Prediction, and atmospheric density modeling.


$\alpha$-Rank-Collections: Analyzing Expected Strategic Behavior with Uncertain Utilities

Pieroth, Fabian R., Bichler, Martin

arXiv.org Artificial Intelligence

Game theory largely rests on the availability of cardinal utility functions. In contrast, only ordinal preferences are elicited in fields such as matching under preferences. The literature focuses on mechanisms with simple dominant strategies. However, many real-world applications do not have dominant strategies, so intensities between preferences matter when participants determine their strategies. Even though precise information about cardinal utilities is unavailable, some data about the likelihood of utility functions is typically accessible. We propose to use Bayesian games to formalize uncertainty about decision-makers' utilities by viewing them as a collection of normal-form games where uncertainty about types persists in all game stages. Instead of searching for the Bayes-Nash equilibrium, we consider the question of how uncertainty in utilities is reflected in uncertainty of strategic play. We introduce $\alpha$-Rank-collections as a solution concept that extends $\alpha$-Rank, a recently introduced solution concept for normal-form games, to Bayesian games. This allows us to analyze strategic play in, for example, (non-strategyproof) matching markets, for which no appropriate solution concepts have existed so far. $\alpha$-Rank-collections characterize a range of strategy profiles emerging from the replicator dynamics of the game, rather than a single equilibrium point. We prove that $\alpha$-Rank-collections are invariant to positive affine transformations and that they are efficient to approximate. An instance of the Boston mechanism is used to illustrate the new solution concept.


Annotation Error Detection: Analyzing the Past and Present for a More Coherent Future

Klie, Jan-Christoph, Webber, Bonnie, Gurevych, Iryna

arXiv.org Artificial Intelligence

Annotated data is an essential ingredient in natural language processing for training and evaluating machine learning models. It is therefore very desirable for the annotations to be of high quality. Recent work, however, has shown that several popular datasets contain a surprising amount of annotation errors or inconsistencies. To alleviate this issue, many methods for annotation error detection have been devised over the years. While researchers show that their approaches work well on their newly introduced datasets, they rarely compare their methods to previous work or on the same datasets. This raises strong concerns about methods' general performance and makes it difficult to assess their strengths and weaknesses. We therefore reimplement 18 methods for detecting potential annotation errors and evaluate them on 9 English datasets for text classification as well as token and span labeling. In addition, we define a uniform evaluation setup, including a new formalization of the annotation error detection task, an evaluation protocol, and general best practices. To facilitate future research and reproducibility, we release our datasets and implementations in an easy-to-use and open-source software package.
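One classic family of detectors the survey covers, variation-based detection, is simple enough to sketch: group identical inputs, take the majority label per group, and flag every instance whose label disagrees with its group's majority. This is a minimal sketch of one detector family for text classification, not one of the paper's 18 reimplementations.

```python
from collections import Counter, defaultdict

def flag_label_variation(dataset):
    """Variation-based annotation error detection sketch. `dataset` is a
    list of (text, label) pairs; returns the sorted indices of instances
    whose label disagrees with the majority label among identical texts.
    Singleton texts trivially agree with themselves and are never flagged."""
    groups = defaultdict(list)
    for idx, (text, label) in enumerate(dataset):
        groups[text].append((idx, label))
    flagged = []
    for items in groups.values():
        majority = Counter(label for _, label in items).most_common(1)[0][0]
        flagged.extend(idx for idx, label in items if label != majority)
    return sorted(flagged)
```

Detectors of this shape produce candidate errors for human review rather than automatic corrections, which is the evaluation setting the paper formalizes.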